Long-Short Strategy, Part 3: Evaluating our Boosting Model Signals

In this section, we'll start designing, implementing, and evaluating a trading strategy for US equities driven by daily return forecasts produced by gradient boosting models.

As in the previous examples, we'll lay out a framework and build a specific example that you can adapt to run your own experiments. There are numerous aspects that you can vary, from the asset class and investment universe to more granular aspects like the features, holding period, or trading rules. See, for example, the Alpha Factor Library in the Appendix for numerous additional features.

We'll keep the trading strategy simple and only use a single ML signal; a real-life application will likely use multiple signals from different sources, such as complementary ML models trained on different datasets or with different lookahead or lookback periods. It would also use sophisticated risk management, from simple stop-loss to value-at-risk analysis.

Six notebooks cover our workflow sequence:

  1. preparing_the_model_data: we engineer a few simple features from the Quandl Wiki data
  2. trading_signals_with_lightgbm_and_catboost: we tune hyperparameters for LightGBM and CatBoost to select a model, using 2015/16 as our validation period.
  3. evaluate_trading_signals (this notebook): we compare the cross-validation performance using various metrics to select the best model.
  4. model_interpretation: we take a closer look at the drivers behind the best model's predictions.
  5. making_out_of_sample_predictions: we generate predictions for our out-of-sample test period 2017.
  6. backtesting_with_zipline: we evaluate the historical performance of a long-short strategy based on our predictive signals using Zipline.

Cross-validation of numerous configurations has produced a large number of results. Now, we need to evaluate the predictive performance to identify the model that generates the most reliable and profitable signals for our prospective trading strategy.

Imports & Settings

Collect Data

We produced a larger number of LightGBM models because LightGBM runs an order of magnitude faster than CatBoost. We will therefore use the LightGBM results to demonstrate several of the evaluation strategies.

LightGBM

Summary Metrics by Fold

First, we collect the summary metrics computed for each fold and hyperparameter combination:

Information Coefficient by Day

Next, we retrieve the IC per day computed during cross-validation:
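As a reminder of what the daily IC measures, here is a minimal sketch that computes the Spearman rank correlation between predictions and realized returns for each date. The column names `y_pred` and `y_true`, the index level names, and the toy values are assumptions for illustration, not the layout of the stored cross-validation results.

```python
import pandas as pd


def daily_ic(df, pred_col="y_pred", target_col="y_true"):
    """Spearman rank correlation of predictions vs. outcomes, per date.

    Spearman correlation is the Pearson correlation of the ranks,
    so we rank both columns within each day and correlate the ranks.
    """
    ics = {}
    for date, day in df.groupby(level="date"):
        ics[date] = day[pred_col].rank().corr(day[target_col].rank())
    return pd.Series(ics, name="ic")


# Hypothetical toy data: two dates, four symbols each
dates = pd.to_datetime(["2015-01-02", "2015-01-05"])
symbols = ["AAA", "BBB", "CCC", "DDD"]
idx = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])
toy = pd.DataFrame(
    {
        # day 1: predictions rank the outcomes perfectly; day 2: exactly reversed
        "y_pred": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
        "y_true": [0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.04],
    },
    index=idx,
)

ic_by_day = daily_ic(toy)
mean_daily_ic = ic_by_day.mean()
```

Averaging the daily values gives the mean daily IC, which can differ from the IC computed once over all predictions in the validation period.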

CatBoost

We proceed similarly for CatBoost:

Summary Metrics

Daily Information Coefficient

Validation Performance: Daily vs Overall Information Coefficient

The following image shows that LightGBM (in orange) performs (slightly) better than CatBoost, especially for longer horizons. This is not an entirely fair comparison because we ran more configurations for LightGBM, which also, unsurprisingly, shows a wider dispersion of outcomes:

Hyperparameter Impact: Linear Regression

Next, we'd like to understand if there's a systematic, statistical relationship between the hyperparameters and the outcomes across daily predictions. To this end, we will run a linear regression using the various LightGBM hyperparameter settings as dummy variables and the daily validation IC as the outcome.

The chart below shows the coefficient estimates and their confidence intervals for 1- and 21-day forecast horizons.

Note that these results apply to this specific example only.
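The regression described above can be sketched with plain NumPy and pandas. This is an illustration under stated assumptions rather than the notebook's implementation: the parameter names, the synthetic IC values, and the helper `dummy_regression` are hypothetical, and the intervals use a normal approximation (plus or minus 1.96 standard errors).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical results: one row per (hyperparameter combination, day)
data = pd.MultiIndex.from_product(
    [[0.01, 0.1], [8, 32], range(50)],
    names=["learning_rate", "num_leaves", "day"],
).to_frame(index=False)

# Synthetic daily IC with known effects, so the estimates can be checked
data["ic"] = (
    0.01
    + 0.02 * (data["learning_rate"] == 0.1)
    + 0.005 * (data["num_leaves"] == 32)
    + rng.normal(0, 0.001, len(data))
)


def dummy_regression(df, params, target="ic"):
    """OLS of the target on one-hot encoded hyperparameter settings."""
    # drop_first avoids perfect collinearity with the constant
    X = pd.get_dummies(df[params], columns=params, drop_first=True, dtype=float)
    X.insert(0, "const", 1.0)
    Xv, y = X.to_numpy(), df[target].to_numpy()
    beta, *_ = np.linalg.lstsq(Xv, y, rcond=None)
    resid = y - Xv @ beta
    sigma2 = resid @ resid / (len(y) - Xv.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xv.T @ Xv)))
    return pd.DataFrame(
        {"coef": beta, "lower": beta - 1.96 * se, "upper": beta + 1.96 * se},
        index=X.columns,
    )


estimates = dummy_regression(data, ["learning_rate", "num_leaves"])
```

Each coefficient measures the average change in daily IC relative to the omitted baseline setting of that hyperparameter, holding the other settings fixed.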

Cross-validation Result: Best Hyperparameters

LightGBM

The top-performing LightGBM models use the following parameters for the three different prediction horizons.
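One way to pick the top-performing configurations per horizon is to sort by the validation metric and keep the first few rows in each group. The sketch below assumes a metrics table with hypothetical columns `lookahead`, `learning_rate`, `num_leaves`, and `ic`; the values are made up for illustration and are not results from the cross-validation runs.

```python
import pandas as pd

# Hypothetical summary metrics: one row per (horizon, configuration)
metrics = pd.DataFrame(
    {
        "lookahead": [1, 1, 1, 21, 21, 21],
        "learning_rate": [0.01, 0.1, 0.3, 0.01, 0.1, 0.3],
        "num_leaves": [8, 32, 128, 8, 32, 128],
        "ic": [0.010, 0.015, 0.012, 0.030, 0.020, 0.045],
    }
)


def top_n_by_horizon(df, metric="ic", n=1, horizon="lookahead"):
    """Return the n best configurations per forecast horizon."""
    return (
        df.sort_values(metric, ascending=False)
        .groupby(horizon)
        .head(n)  # keeps the first n rows of each group in sorted order
        .sort_values([horizon, metric], ascending=[True, False])
        .reset_index(drop=True)
    )


best = top_n_by_horizon(metrics, n=1)
```

Setting `n` to a larger value returns several candidates per horizon, which is useful when predictions from more than one model are to be combined.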

CatBoost

Visualization

LightGBM

CatBoost

Some figures are empty because we did not run those parameter combinations.

AlphaLens Analysis - Validation Performance

LightGBM

Select Parameters

Plot rolling IC

Get Predictions for Validation Period

We retrieve the predictions for the 10 validation runs:

Get Trade Prices

Using next available prices.
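A common way to obtain "next available" prices is to shift the price series back by one period, so that a signal dated t is paired with the first price at which it could realistically be traded. The sketch below assumes daily open prices in a wide DataFrame; the tickers and prices are hypothetical, and the choice of the open price is an assumption rather than something stated above.

```python
import pandas as pd

# Hypothetical open prices: rows are business days, columns are tickers
dates = pd.date_range("2015-01-05", periods=4, freq="B")
open_prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0], "BBB": [20.0, 19.0, 18.0, 17.0]},
    index=dates,
)

# Align each date with the following day's open: a prediction made
# with data through day t is assumed to trade at the open of day t+1.
trade_prices = open_prices.shift(-1)
```

The last row becomes NaN because no later price exists; dropping or ignoring it avoids using prices that would not have been available.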

We average the predictions of the top five models and provide them, together with the corresponding prices, to Alphalens in order to compute the mean period-wise return earned on an equal-weighted portfolio invested in the daily factor quintiles for various holding periods:
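To make the quintile analysis concrete, the following pandas-only sketch mimics the idea on synthetic data: average several models' predictions into one factor, assign daily quintiles, and compare the mean forward return per quintile. It is not the Alphalens implementation; all names and the data-generating process are assumptions, and the synthetic predictions are deliberately built to be informative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-04", periods=30, freq="B")
assets = [f"T{i:02d}" for i in range(20)]
idx = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Synthetic 1-day forward returns and five noisy model predictions of them
fwd_ret = pd.Series(rng.normal(0, 0.01, len(idx)), index=idx, name="fwd_1d")
preds = pd.DataFrame(
    {f"model_{i}": fwd_ret + rng.normal(0, 0.02, len(idx)) for i in range(5)}
)

# Combine the models by simple averaging
factor = preds.mean(axis=1).rename("factor")

# Assign each asset to a quintile (1 = lowest, 5 = highest) within each day
quintile = (
    factor.groupby(level="date")
    .transform(lambda x: pd.qcut(x, 5, labels=False) + 1)
    .astype(int)
    .rename("quintile")
)

# Equal-weighted mean forward return per quintile, pooled over all days
mean_ret_by_quintile = fwd_ret.groupby(quintile).mean()
spread = mean_ret_by_quintile.loc[5] - mean_ret_by_quintile.loc[1]
```

A positive spread between the top and bottom quintile is the pattern a long-short strategy aims to exploit; on real predictions, it needs to be checked across holding periods and net of trading costs.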

Create AlphaLens Inputs

Compute Alphalens metrics

Summary Tearsheet

CatBoost

Select Parameters

Get Predictions

Get Trade Prices

Using next available prices.

Create AlphaLens Inputs

Summary Tearsheet